Identifying cancer patients who received palliative care using the SPICT-LIS in medical records: a rule-based algorithm and text-mining technique

Background Due to limited numbers of palliative care specialists and/or resources, access to palliative care remains limited in many low- and middle-income countries. Data science methods, such as rule-based algorithms and text mining, have the potential to improve palliative care by facilitating analysis of electronic healthcare records. This study aimed to develop and evaluate a rule-based algorithm for identifying cancer patients who may benefit from palliative care based on the Thai version of the Supportive and Palliative Care Indicators for a Low-Income Setting (SPICT-LIS) criteria. Methods The medical records of 14,363 cancer patients aged 18 years and older, diagnosed between 2016 and 2020 at Songklanagarind Hospital, were analyzed. Two rule-based algorithms, strict and relaxed, were designed to identify key SPICT-LIS indicators in the electronic medical records using tokenization and sentiment analysis. The inter-rater reliability between these two algorithms and palliative care physicians was assessed using percentage agreement and Cohen’s kappa coefficient. Additionally, factors associated with patients who might benefit from palliative care were examined. Results The strict rule-based algorithm demonstrated a high degree of accuracy, with 95% agreement and a Cohen’s kappa coefficient of 0.83. In contrast, the relaxed rule-based algorithm demonstrated lower agreement (71% agreement and a Cohen’s kappa of 0.16). Advanced-stage cancer and symptoms such as pain, dyspnea, edema, delirium, xerostomia, and anorexia were identified as significant predictors of potentially benefiting from palliative care. Conclusion The integration of rule-based algorithms with electronic medical records offers a promising method for enhancing the timely and accurate identification of patients with cancer who might benefit from palliative care. Supplementary Information The online version contains supplementary material available at 10.1186/s12904-024-01419-1.


Background
Globally, approximately 40 million individuals require palliative care each year, with 78% living in low- and middle-income countries where palliative care resources, including home ventilators, are limited [1]. Only 14% of these patients are estimated to receive appropriate palliative care [1]. Several factors have resulted in the limited access to palliative care in these countries. Resources, including palliative care specialists and devices such as oxygen generators and syringe drivers, are limited [2]. In developing countries, including Thailand, there are also few hospice services available [3], which further exacerbates the accessibility challenges. One suggestion to help alleviate such accessibility challenges is priority screening, so that only patients who might benefit from palliative care are offered the service. Experts have addressed this issue by establishing various scoring systems and criteria to help prioritize and select patients [4].
Several tools have been developed to help physicians and/or healthcare teams assess patients who should receive palliative care [5]. The Supportive and Palliative Care Indicators Tool (SPICT) [6] is the most common tool used to help healthcare professionals identify patients with advanced life-limiting conditions who may benefit from a holistic palliative care program. The SPICT was first published in 2014 and has been used in over 30 countries [6]. A modified version of the SPICT, the Supportive and Palliative Care Indicators for a Low-Income Setting (SPICT-LIS), was created in 2019 for use in low-income countries [6].
In 2021, the SPICT-LIS was translated into Thai, with cross-cultural validation, by Sripaew et al. It was subsequently tested on real-world data in a retrospective study by Fumaneeshoat et al. [7], which used the Thai-translated SPICT-LIS to identify cancer patients who may have benefited from palliative care in Thailand. They found that 7.8% of 9,990 patients with cancer might have qualified for palliative care [8]. However, in real life it is challenging to assess all cancer patients with an instrument such as the SPICT-LIS to determine who would benefit from palliative care, owing to limited resources, healthcare providers, and knowledge. If a simpler, more user-friendly screening tool could be developed, it would be easier for healthcare professionals to identify patients who might benefit from palliative care.
Appropriate text-mining techniques and text-based artificial intelligence such as Natural Language Processing could potentially play a very useful role in modern healthcare, with its voluminous electronic medical records, owing to their ability to extract information from unstructured clinical text data such as medical records and physician notes [9]. These techniques, which are powered by algorithms, are designed to efficiently process and analyze large volumes of textual data [10].
Therefore, this study aimed to create a rule-based algorithm based on regular expressions and sentiment analysis to identify, from electronic medical records, patients who might benefit from palliative care based on the Thai SPICT-LIS criteria. The factors and characteristics of patients recommended for palliative care by palliative care specialists were used to develop and adjust the rule-based algorithm to improve the accuracy of the model.

Study design and setting
The electronic medical records, including electronic doctor's notes (eDN) and patient characteristics, were extracted from the database of Songklanagarind Hospital, the biggest hospital in Southern Thailand. The data scientists of the Division of Digital Innovation and Data Analytics (DIDA), Faculty of Medicine, Prince of Songkla University, supervised by Dr. Ingviya, randomly selected 100 inpatients with a cancer diagnosis confirmed by the Cancer Registry and prepared their eDNs and patient characteristics for review. Two palliative care physicians independently reviewed the medical records of the 100 randomly selected cancer patients between February and June 2022. The records were reviewed using the Thai version of the SPICT-LIS to determine whether palliative care would have been beneficial to the patients.

Data source
The data of the study cancer patients were retrieved from the Cancer Registry of Songklanagarind Hospital, and further documents were queried from the hospital inpatient department (IPD) data prepared by DIDA as mentioned above. The eDN, patient characteristics, and vital signs were extracted and stored using the PostgreSQL relational database management system on a physical server in the DIDA Data Center. The querying and merging of text data were done through PostgreSQL.

Inclusion/exclusion criteria
All cancer inpatients aged 18 years or older diagnosed with cancer at Songklanagarind Hospital using the International Classification of Diseases and Related Health Problems 10th Revision Thai Modification (ICD-10 TM) [11] and the International Classification of Diseases for Oncology (ICD-O) [12] from 2016 to 2020 were included in the initial study sample. Patients who had a first admission digital note of ≤ 1,000 words following their cancer diagnosis were excluded from the study to ensure that the study records had an adequate amount of the data required to assess the patients using the Thai SPICT-LIS criteria. The patient characteristics data extracted included birth date, sex, religion, ICD-10 and ICD-O codes, and cancer staging.

Data management and algorithm development

Training set
To create an initial training dataset, two palliative care physicians reviewed the whole records of 100 randomly selected patients and assessed if the patients had any of the six general indicators suggesting that they might benefit from palliative care, which were coded as 1 or 0 for patients who might or might not benefit, respectively. When there was disagreement between the two specialists, a consensus was reached by a face-to-face discussion.

Text-mining models
Text-mining models were created to extract essential data from the standard-language text of the eDN data, written in both Thai and English. The models combined tokenization [13] with regular expressions (sequences of characters that form a search pattern) [14]. Tokenization was performed with the 'LexTo' package, which enables tokenization of the Thai language in the R program.
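As an illustration of the two techniques named above, the following is a minimal sketch in Python (the study itself used R with the 'LexTo' package). It handles English text only; the pattern, the sample note, and the function names are hypothetical and are not taken from the study's data dictionary.

```python
import re

# Hypothetical search pattern for illustration; the study's real terms are in Table S1.
WEIGHT_LOSS = re.compile(r"\bsignificant\s+weight\s+loss\b", re.IGNORECASE)


def tokenize(text):
    """Split English text into lowercase word tokens.

    Thai has no spaces between words, so it needs a dictionary-based
    tokenizer (the study used 'LexTo' in R); that step is not shown here.
    """
    return re.findall(r"[a-z]+", text.lower())


def find_matches(note, pattern):
    """Return every substring of the note that matches the search pattern."""
    return [m.group(0) for m in pattern.finditer(note)]


# Invented example note, not real patient data.
note = "Pt reports Significant weight  loss over 3 months, poor appetite."
tokens = tokenize(note)
matches = find_matches(note, WEIGHT_LOSS)
```

The regular expression tolerates variable whitespace and letter case, which is one reason pattern matching is often preferred over exact string comparison for free-text notes.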

Sentiment analysis
Sentiment analysis involves classifying data into categories like positive or negative [15]. For instance, the word "pain" might be labeled as negative, whereas the phrase "no pain" could be considered positive. Text segments in the code were passed directly as input to the model. In this study, the sentiment analysis model was trained to categorize the sentiment of a given text into two groups: patients who might benefit from palliative care and those who might not.
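The "pain" versus "no pain" example can be sketched as a simple negation check. This is a rule-of-thumb illustration in Python, not the trained model used in the study; the negation cue list and the two-token look-back window are assumptions.

```python
import re

# Hypothetical negation cues; a real dictionary would also cover Thai terms.
NEGATIONS = {"no", "not", "denies", "without"}


def symptom_polarity(text, symptom):
    """Label a symptom mention following the convention in the text.

    Returns "negative" when the symptom is mentioned without negation,
    "positive" when a negation cue appears within the two preceding tokens,
    and None when the symptom is not mentioned. Only the first mention
    of the symptom is examined.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if symptom not in tokens:
        return None
    idx = tokens.index(symptom)
    window = tokens[max(0, idx - 2):idx]
    return "positive" if any(t in NEGATIONS for t in window) else "negative"
```

For example, `symptom_polarity("no pain", "pain")` yields `"positive"`, while `symptom_polarity("severe pain at night", "pain")` yields `"negative"`.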

Data dictionaries
A data dictionary encompassing a range of mixed Thai and English words/phrases was created using tokenization and sentiment analysis to classify patients into two groups based on whether they satisfied any of the six Thai SPICT-LIS general indicators or not. In general, words/phrases and/or sentences indicating symptoms and patient history were used to determine if the patients had presented with any of the six general indicators. The classification and extraction of each general indicator was performed on the physicians' free-text comments using the mixed-language data dictionary. For international readers of our paper, we translated the Thai words/phrases/sentences in the data dictionary into universally understood English terminology, presented side by side with the corresponding Thai words/sentences, as detailed in Table S1.

Rule-based algorithms
Two rule-based algorithms were created based on regular expressions, tokenization, and sentiment analysis using R version 4.0.3 (R Core Team, Austria), and were applied to the whole records written in a mixture of Thai and English words/phrases/sentences.
Strict and relaxed rule-based algorithms were used in this study. The strict rule-based algorithm used a stringent set of criteria for identifying each indicator. It was characterized by its focus on explicit and well-defined terms, which could have led to fewer cases meeting the criteria. In contrast, the relaxed rule-based algorithm used a more flexible approach characterized by its inclusiveness in considering a variety of factors that could have indicated the presence of the condition, which could have resulted in a larger number of identified cases. For example, in the strict rule-based criteria of the Thai SPICT-LIS algorithm, only 'significant weight loss' was included, while in the relaxed rule-based approach, additional keywords such as 'weight loss', 'underweight', 'hyposthenic build', and 'thinner' were also considered alongside significant weight loss.
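The weight-loss example above can be sketched as two keyword sets compiled into regular expressions. The keywords are those quoted in the text; the function names are hypothetical, the sketch is in Python rather than the R used in the study, and it omits Thai terms and negation handling.

```python
import re

# Keyword lists taken from the example in the text (English terms only).
STRICT_TERMS = ["significant weight loss"]
RELAXED_TERMS = STRICT_TERMS + ["weight loss", "underweight", "hyposthenic build", "thinner"]


def compile_terms(terms):
    """Build one case-insensitive pattern that matches any of the given terms."""
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


STRICT = compile_terms(STRICT_TERMS)
RELAXED = compile_terms(RELAXED_TERMS)


def weight_loss_indicator(note):
    """Report whether each rule set flags the weight-loss indicator in a note."""
    return {
        "strict": bool(STRICT.search(note)),
        "relaxed": bool(RELAXED.search(note)),
    }
```

Because the relaxed list is a superset of the strict list, any note flagged by the strict rules is also flagged by the relaxed rules, but not the reverse.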

Outcome measurements
The main outcome was whether the algorithm correctly identified patients who might benefit from palliative care as indicated by the SPICT-LIS. The instrument was originally back-translated into Thai by Sripaew et al., following the WHO guidelines for the systematic adaptation of tools, and was then found to provide consistent responses with good agreement among general practitioners, with a Fleiss' kappa of 0.93 (0.76-1.00) [7]. The six indicators of the Thai SPICT-LIS are as follows: Indicator 1: performance status is poor or deteriorating, best available treatment has limited effect; Indicator 2: depends on others for care due to increasing physical and/or mental health problems; Indicator 3: the individual's carer requires more help and support; Indicator 4: the individual experienced significant weight loss over the last few months or remains underweight; Indicator 5: persistent symptoms despite receiving the best available treatment for underlying condition(s) and is unable to access treatment; and Indicator 6: the individual (or family) asks for palliative care and chooses to reduce, stop, or not have treatment or wishes to focus on quality of life. Patients who would possibly benefit from palliative care were those who met at least two general indicators and at least one clinical indicator [7,16,17]. Patients who met these criteria were defined as "should be offered palliative care as they could benefit from it". The others, who did not meet the indicators for being offered palliative care at this time, were defined as "should not be offered palliative care".
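The decision rule stated above (at least two general indicators and at least one clinical indicator) can be written as a short function. This is a sketch in Python; the function name and the representation of indicators as lists of booleans are assumptions, and the clinical indicators themselves are not enumerated in this section.

```python
def should_offer_palliative_care(general_flags, clinical_flags):
    """Apply the stated rule: >= 2 general indicators and >= 1 clinical indicator.

    general_flags: iterable of truthy/falsy values, one per general indicator.
    clinical_flags: iterable of truthy/falsy values, one per clinical indicator
                    (hypothetical input; not listed in this section).
    """
    n_general = sum(bool(flag) for flag in general_flags)
    n_clinical = sum(bool(flag) for flag in clinical_flags)
    return n_general >= 2 and n_clinical >= 1
```

A patient with two general indicators but no clinical indicator would therefore be classified as "should not be offered palliative care" under this rule.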

Inter-rater reliability
Percentage agreement and kappa statistics were used to measure the inter-rater reliability between the physicians and the strict and relaxed rule-based algorithms. Cohen's kappa was interpreted as follows: a value above 0.7 indicates good agreement; values between 0.4 and 0.7 indicate moderate agreement; and values below 0.4 indicate poor agreement [18].
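For readers unfamiliar with these statistics, the sketch below computes percentage agreement and Cohen's kappa for two raters and applies the interpretation thresholds given above. It is written in Python for illustration; the function names are hypothetical and the ratings in the usage note are invented toy data, not the study's results.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters give the same label."""
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    labels = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Thresholds as stated in the text: >0.7 good, 0.4-0.7 moderate, <0.4 poor."""
    if kappa > 0.7:
        return "good"
    if kappa >= 0.4:
        return "moderate"
    return "poor"
```

With the toy ratings `[1, 1, 0, 0, 1, 0, 1, 0, 0, 0]` and `[1, 1, 0, 0, 1, 0, 0, 0, 0, 0]`, the observed agreement is 0.90 and the expected chance agreement is 0.54, giving a kappa of about 0.78. High percentage agreement can coexist with a low kappa when one label dominates, which is why both measures are reported.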

Prevalence and factor association
The number of patients with cancer who should be offered palliative care because they would benefit from it was compared between the palliative care specialists and both the strict and relaxed rule-based algorithms. Descriptive statistics and Fisher's exact tests were used to compare the characteristics of patients with cancer who should be given palliative care with those of patients who should not. Factors associated with patients with cancer who required palliative care were assessed using Fisher's exact test and multiple logistic regression analysis. The multiple logistic regression analysis included age, sex, cancer type, cancer stage, and patient symptoms such as pain, dyspnea, anorexia, edema, dysphagia, ascites, and xerostomia. A p-value of less than 0.05 was considered statistically significant.
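For reference, the standard relationship between a multiple logistic regression model and the odds ratios (OR) with 95% confidence intervals reported in the Results is sketched below. The notation is an assumption introduced here for illustration: $Y = 1$ denotes meeting the SPICT-LIS criteria, $x_j$ are the covariates, and $\hat{\beta}_j$ are the fitted coefficients with standard errors $\mathrm{SE}(\hat{\beta}_j)$. The source does not state these formulas explicitly.

```latex
\begin{align}
  \log\frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)}
    &= \beta_0 + \sum_{j=1}^{p} \beta_j x_j, \\
  \mathrm{OR}_j &= \exp\bigl(\hat{\beta}_j\bigr), \\
  95\%\ \mathrm{CI}_j
    &= \exp\bigl(\hat{\beta}_j \pm 1.96\,\mathrm{SE}(\hat{\beta}_j)\bigr).
\end{align}
```

An odds ratio whose 95% confidence interval excludes 1 corresponds to a Wald test p-value below 0.05 for that covariate.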

Results
A total of 18,203 patients were enrolled in the Cancer Registry of Songklanagarind Hospital during the study period, of whom 2,448 patients whose admission dates preceded their cancer diagnosis dates were excluded. Additionally, 765 patients whose doctors' notes contained fewer than 1,000 words, and 585 patients diagnosed with benign masses or masses of unknown behavior were also excluded. The final analysis included a total of 14,363 patients as presented in Fig. 1.
In the training dataset, comprising the admission notes of 100 patients, the rule-based algorithms were compared with human assessment by palliative care physicians in terms of the proportion of patients meeting the SPICT-LIS criteria. The strict rule-based algorithm showed a high percentage agreement of 95% with the trained physicians, with a Cohen's kappa coefficient of 0.83 (0.67-0.99), indicating strong concordance. In contrast, the relaxed rule-based algorithm showed a percentage agreement of 71% and a Cohen's kappa coefficient of 0.16 (0.02-0.30), indicating lower agreement, as detailed in Table S2.
From 2016 to 2020, 14,363 cancer patients met the study criteria. Approximately 65% of the patients were aged < 65 years. The male-to-female ratio was approximately 1:1. The most common types of cancer were gastrointestinal and gynecological cancers, followed by breast cancer. Approximately 45% of the patients had stage 3 or 4 cancer at the time of diagnosis (Table 1).
Table 2 presents the number of patients with cancer who met the criteria for each indicator under the relaxed rule-based algorithm and the strict rule-based algorithm. Regarding the number of patients with cancer who could possibly benefit from palliative care according to these different algorithms, of the 14,363 study cancer patients identified in Songklanagarind Hospital, the strict rule-based algorithm resulted in 11.1 per 100 hospitalized patients with cancer, while the relaxed rule-based algorithm resulted in 22.9 per 100 hospitalized patients with cancer.
Univariate analysis indicated that the number of patients with cancer eligible for palliative care increased with increasing age, higher cancer stage, and certain specific sites of primary cancer. Symptoms such as pain, dyspnea, edema, anorexia, xerostomia, delirium, ascites, and dysphagia were associated with higher odds of patients with cancer being likely to benefit from palliative care (see Table S3).
The investigation into factors associated with a high likelihood of requiring palliative care utilized two different algorithms: a relaxed rule-based algorithm and a strict rule-based algorithm. In the relaxed approach, sex (OR = 1.25, 95% CI: 1.08-1.43), specific cancer sites, cancer stages, and various symptoms were associated with higher probabilities of patients with cancer meeting the SPICT-LIS criteria. Notably, the agreement rate between the relaxed rule-based algorithm and human assessment by palliative care physicians was 71%, indicating moderate concordance.
On the other hand, the strict rule-based algorithm identified older age (OR = 1.18, 95% CI: 1.02-1.37), specific cancer sites, cancer stages, and symptoms as associated factors, as shown in Table 3. Its agreement rate with human assessment by palliative care physicians was high, at 95%.

The main findings
This study used a rule-based algorithm based on text tokenization and sentiment analysis to identify patients with cancer who should be offered palliative care because they would benefit from it according to the SPICT-LIS criteria. Due to healthcare resource limitations, especially in palliative care, in low- and middle-income [19] countries such as Thailand [20], the understanding of the care process and access to palliative care remain limited [21]. Based on this study, integrating the two rule-based SPICT-LIS algorithms into electronic medical records or hospital information systems could assist physicians in the early detection of patients who may benefit from palliative care services. The results of the study highlight the potential use and effectiveness of rule-based algorithms for identifying palliative care cases.
The study designed two rule-based algorithms to identify the key SPICT-LIS indicators in medical records. These algorithms were developed with a focus on patients with cancer. The rules were formulated based on clinical guidelines and expert knowledge, allowing the algorithms to recognize terms and phrases related to the SPICT-LIS criteria [7].
These findings have significant implications for healthcare providers and researchers. Rule-based algorithms show a promising ability to assist in identifying patients who may benefit from palliative care. Therefore, they are a potential tool for improving patient access to palliative care.

Relaxed and strict rule-based algorithms
Two approaches, relaxed and strict rule-based algorithms, were implemented and compared. This comparison aimed to assess the accuracy and efficiency of a rule-based algorithm for evaluating palliative care candidates among patients with cancer. The strict approach applied stringent criteria to identify patients who met at least one of the SPICT-LIS indicators. The results showed that the strict rule-based algorithm demonstrated a high degree of specificity in identifying cancer patients who might benefit from palliative care; in other words, it showed a low false-positive rate. However, this study acknowledged concerns regarding cases that may be missed with the strict rule-based criteria. A relaxed rule-based algorithm, which identifies a broader range of cases but with a certain false-positive rate, may therefore be more appropriate for use as a primary screening tool. Specifically, if the relaxed rule-based algorithm identifies a patient as positive while the strict rule-based algorithm does not, a physician should evaluate the patient to ensure coverage of all patients who may benefit from palliative care services.
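The two-stage use of the algorithms suggested above could be sketched as follows. This is an illustrative Python sketch; the function name and the three outcome labels are hypothetical, and treating a strict-positive result as a likely candidate without further review is an assumption rather than a recommendation stated in the study.

```python
def triage(strict_positive, relaxed_positive):
    """Combine the two algorithm outputs into a screening action.

    The labels are hypothetical. Only the middle case (relaxed positive,
    strict negative -> physician evaluation) is described in the text.
    """
    if strict_positive:
        return "likely_candidate"      # assumption: high-specificity rule fired
    if relaxed_positive:
        return "physician_review"      # as suggested in the text
    return "no_flag"
```

Used this way, the relaxed algorithm acts as a sensitive first filter and the strict algorithm as a specific confirmation, with physician time concentrated on the discordant cases.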

The factors associated with patients who might benefit from palliative care
The factors associated with patients who might benefit from palliative care, as determined by the rule-based algorithm, included advanced-stage cancer and symptoms such as pain, dyspnea, edema, delirium, xerostomia, and anorexia. Most of the associated factors were consistent with previous studies, in which cases who should receive palliative care were determined based on physician judgment [22,23]. Advanced-stage cancer has been reported as a significant predictor of patients requiring palliative care in various studies [24]. Patients diagnosed at later stages of cancer often experience more severe symptoms and complications, making them potential candidates for palliative interventions [25]. Pain is a prominent concern in palliative care utilization [26]. Cancer-related pain can be debilitating, making effective pain management an important aspect of palliative care [27]. Dyspnea and other symptoms are often observed in patients with advanced cancer and are important indicators for receiving palliative care [28,29]. Additionally, edema [30], delirium [31], xerostomia, and anorexia [29] are also positive predictors of palliative care utilization in patients with cancer. These symptoms contribute to the complex symptom burden that palliative care aims to alleviate, emphasizing the importance of early and comprehensive symptom assessments for appropriate palliative care interventions [29,32].

Limitations
This study had several limitations. First, there is wide variability in clinical documentation and terminology found in medical records. The fixed rules of rule-based algorithms can make them less adaptable to the diverse languages and documentation practices prevalent in various healthcare institutions. Second, the accuracy of the algorithm results may be influenced by the quality and completeness of the medical records; incorrect or incomplete information can result in false positives and negatives.

Suggestions
To improve the application of text mining and rule-based algorithms in palliative care identification, several key future directions should be explored. First, algorithm rules should be refined in collaboration with healthcare professionals to accommodate diverse clinical contexts and terminologies and so improve accuracy. Second, because the development of healthcare systems in low- to middle-income countries is technologically limited, we strongly support the development and implementation of country-wide health information systems. The use of algorithms such as those described above could facilitate the improved use of health records in identifying people who might benefit from palliative care or other emerging treatments.
Future research with larger datasets should focus on more advanced Natural Language Processing techniques, including deep Bidirectional Encoder Representations from Transformers or generative artificial intelligence, for more accurate classification of patients who might benefit from palliative care.

Conclusion
This study demonstrated the potential of rule-based algorithms and text-mining techniques using medical records in identifying patients with cancer who will benefit from palliative care based on the SPICT-LIS criteria. This approach offers a promising solution to improve the timeliness and accuracy of palliative care case identification.

Table 2
Comparison of patients meeting one or more SPICT-LIS criteria by the text-mining algorithms

Table 3
Multiple logistic regression analysis for the prevalence of study cancer patients meeting the SPICT-LIS criteria